

PrimeComposer: Faster Progressively Combined Diffusion for Image Composition with Attention Steering

Wang, Yibin, Zhang, Weizhong, Zheng, Jianwei, Jin, Cheng

arXiv.org Artificial Intelligence

Image composition involves seamlessly integrating given objects into a specific visual context. Current training-free methods rely on composing attention weights from several samplers to guide the generator. However, since these weights are derived from disparate contexts, combining them causes coherence confusion in the synthesis and loses appearance information. These issues are worsened by the methods' excessive focus on background generation, which is unnecessary in this task: it not only slows down inference but also compromises foreground quality. Moreover, these methods introduce unwanted artifacts in the transition area. In this paper, we formulate image composition as a subject-based local editing task that focuses solely on foreground generation. At each step, the edited foreground is combined with the noisy background to maintain scene consistency. To address the remaining issues, we propose PrimeComposer, a faster training-free diffuser that composes images through well-designed attention steering across different noise levels. This steering is achieved predominantly by our Correlation Diffuser, which uses its self-attention layers at each step: within these layers, the synthesized subject interacts with both the referenced object and the background, capturing intricate details and coherent relationships. This prior information is encoded into the attention weights, which are then integrated into the self-attention layers of the generator to guide synthesis. In addition, we introduce a Region-constrained Cross-Attention that confines the influence of specific subject-related words to the desired regions, eliminating the unwanted artifacts of the prior method and further improving coherence in the transition area. Our method achieves the fastest inference efficiency, and extensive experiments demonstrate its superiority both qualitatively and quantitatively.
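Two of the mechanisms the abstract describes can be sketched in a few lines: combining the edited foreground with the noisy background via a spatial mask, and a region-constrained cross-attention that blocks subject-related tokens from influencing pixels outside the desired region. The sketch below is an illustrative NumPy toy, not the authors' implementation; all shapes, function names, and the masking-by-large-negative-score trick are assumptions about how such steering is typically realized.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def blend_latents(fg_latent, bg_noisy, mask):
    """Per-step scene consistency: keep the edited foreground where the
    mask is 1 and the noisy background latent everywhere else."""
    return mask * fg_latent + (1.0 - mask) * bg_noisy

def region_constrained_cross_attention(q, k, v, subject_token_ids, region_mask):
    """Cross-attention where subject-related text tokens only affect pixels
    inside the subject region.
    q: (P, d) pixel queries; k, v: (T, d) token keys/values.
    region_mask: (P,) array, 1 inside the subject region, 0 outside.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)            # (P, T) attention logits
    outside = region_mask == 0               # pixels outside the region
    for t in subject_token_ids:              # mask subject tokens out there
        scores[outside, t] = -1e9
    attn = softmax(scores, axis=-1)          # subject weight ~0 outside
    return attn @ v
```

With the mask applied, attention outside the region renormalizes over the remaining (non-subject) tokens, which is what confines the subject's influence to the desired area.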


TF-ICON: Diffusion-Based Training-Free Cross-Domain Image Composition

Lu, Shilin, Liu, Yanzhu, Kong, Adams Wai-Kin

arXiv.org Artificial Intelligence

Text-driven diffusion models have exhibited impressive generative capabilities, enabling various image editing tasks. In this paper, we propose TF-ICON, a novel Training-Free Image COmpositioN framework that harnesses the power of text-driven diffusion models for cross-domain image-guided composition. This task aims to seamlessly integrate user-provided objects into a specific visual context. Current diffusion-based methods often involve costly instance-based optimization or finetuning of pretrained models on customized datasets, which can potentially undermine their rich prior. In contrast, TF-ICON can leverage off-the-shelf diffusion models to perform cross-domain image-guided composition without requiring additional training, finetuning, or optimization. Moreover, we introduce the exceptional prompt, which contains no information, to facilitate text-driven diffusion models in accurately inverting real images into latent representations, forming the basis for compositing. Our experiments show that equipping Stable Diffusion with the exceptional prompt outperforms state-of-the-art inversion methods on various datasets (CelebA-HQ, COCO, and ImageNet), and that TF-ICON surpasses prior baselines in versatile visual domains. Code is available at https://github.com/Shilin-LU/TF-ICON
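The inversion step the abstract relies on can be illustrated with a toy deterministic DDIM round trip: invert a clean latent into noise under a fixed prompt embedding, then sample back. Everything below is a minimal sketch under stated assumptions: `toy_eps` stands in for the real UNet noise predictor, and the all-zeros embedding is only a stand-in for TF-ICON's information-free "exceptional prompt", not its actual construction.

```python
import numpy as np

def toy_eps(x, emb):
    """Stand-in noise predictor. A real pipeline would call the diffusion
    UNet conditioned on the prompt embedding; this toy ignores x so that
    the DDIM round trip below is exactly invertible."""
    return 0.1 * emb * np.ones_like(x)

def ddim_invert(x0, alphas, emb, eps_fn):
    """Deterministic DDIM inversion: clean latent -> noisy latent.
    alphas holds the cumulative alpha-bar schedule, from ~1 down to small."""
    x = x0
    for i in range(len(alphas) - 1):
        a, a_next = alphas[i], alphas[i + 1]
        eps = eps_fn(x, emb)
        x = np.sqrt(a_next) * (x - np.sqrt(1 - a) * eps) / np.sqrt(a) \
            + np.sqrt(1 - a_next) * eps
    return x

def ddim_sample(xT, alphas, emb, eps_fn):
    """Deterministic DDIM sampling: reverses the inversion loop above."""
    x = xT
    for i in reversed(range(len(alphas) - 1)):
        a, a_next = alphas[i], alphas[i + 1]
        eps = eps_fn(x, emb)
        x = np.sqrt(a) * (x - np.sqrt(1 - a_next) * eps) / np.sqrt(a_next) \
            + np.sqrt(1 - a) * eps
    return x
```

Because the toy predictor is independent of `x`, sampling exactly undoes inversion here; with a real UNet the round trip is only approximate, which is why the choice of prompt during inversion matters.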


Buddhist humanoid in Beijing has 'shaved' head, chants mantras and chats to visitors

Daily Mail - Science & tech

A Buddhist temple on the outskirts of Beijing has decided to ditch traditional ways and use technology to attract followers. Longquan temple has developed a robot monk that can chant Buddhist mantras, move via voice command, and hold a simple conversation. Named Xian'er, the 2ft-tall (60cm) robot resembles a cartoon-like novice monk in yellow robes with a shaven head, holding a touchscreen on its chest. Xian'er can hold a conversation by answering around 20 simple questions about Buddhism and daily life, listed on its screen, and can perform seven types of motion on its wheels.